Collaborative Filtering and Heavy-Tailed Degree Distributions

Authors

  • Maurizio Calo Caligaris
  • Rafael Moreno Ferrer
Abstract

Common techniques in collaborative filtering rely on finding low-rank approximations to the user-item adjacency matrix (the ratings that users assign to items), essentially representing users and items by a small number of latent features. One issue that arises in many real-world collaborative filtering datasets is that the number of observed entries per row/column follows a heavy-tailed distribution. For instance, in the Amazon product ratings dataset, the maximum degree of a product is 12180, whereas the average number of ratings per product is 4.68. We show that these over-represented rows/columns significantly alter the spectrum of the adjacency matrix, which degrades the performance of SVD-based methods for low-rank approximation. We also present an experimental evaluation showing that discarding rows/columns with high degree improves performance across several datasets (Amazon product ratings, movie ratings, and book ratings).
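The idea in the abstract, discarding high-degree rows/columns before computing an SVD-based low-rank approximation, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the degree caps, the toy matrix, and the convention that a zero entry means "unobserved" are all assumptions made here for the example.

```python
import numpy as np


def trim_high_degree(ratings, max_row_degree, max_col_degree):
    """Drop rows/columns whose count of observed entries exceeds a cap.

    Assumption: a zero entry means "no rating observed". Degrees are
    computed once on the input matrix, before anything is removed.
    """
    observed = ratings != 0
    keep_rows = observed.sum(axis=1) <= max_row_degree
    keep_cols = observed.sum(axis=0) <= max_col_degree
    trimmed = ratings[np.ix_(keep_rows, keep_cols)]
    return trimmed, keep_rows, keep_cols


def low_rank_approx(matrix, rank):
    """Best rank-`rank` approximation in the least-squares sense, via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


# Toy user-item matrix (hypothetical data): the last item is rated by everyone.
ratings = np.array(
    [
        [5.0, 0.0, 0.0, 3.0],
        [0.0, 4.0, 0.0, 5.0],
        [0.0, 0.0, 2.0, 4.0],
        [1.0, 0.0, 0.0, 5.0],
    ]
)

# Hypothetical caps: keep users with at most 3 ratings, items with at most 2.
trimmed, keep_rows, keep_cols = trim_high_degree(
    ratings, max_row_degree=3, max_col_degree=2
)
approx = low_rank_approx(trimmed, rank=2)
```

The abstract does not say how the degree cutoff is chosen, so the caps above are placeholders; in practice one would presumably tune them on held-out ratings.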

Similar resources

A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation

Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...

Nonparametric Signed-Rank Shewhart Chart with Variable Sampling Interval

A nonparametric control chart based on ranks is used for detecting changes in the median (mean). In this article, the signed-rank control chart is considered with a variable sampling interval. We compared the performance of the signed-rank chart with variable sampling interval (VSI-SR) to the signed-rank chart with fixed sampling interval (FSI-SR); the numerical results demonstrated that the VSI feature is very useful. Bakir [1] showed ...

Notes: Social Networks: Models, Algorithms, and Applications

We specifically discussed the power law degree distribution which has degree distribution $p_k = C \cdot k^{-\alpha}$ where $C$ is a constant and $\alpha > 1$. While not all degree distributions will be power law, many of the degree distributions of observed networks will be heavy-tailed or long-tailed distributions. A heavy-tailed distribution is a distribution that is "heavier" than the exponential distribution. Here...

A NOVEL FUZZY-BASED SIMILARITY MEASURE FOR COLLABORATIVE FILTERING TO ALLEVIATE THE SPARSITY PROBLEM

Memory-based collaborative filtering is the most popular approach to build recommender systems. Despite its success in many applications, it still suffers from several major limitations, including data sparsity. Sparse data affect the quality of the user similarity measurement and consequently the quality of the recommender system. In this paper, we propose a novel user similarity measure based...

Modeling and Analysis of Heavy-tailed Distributions via Classical Teletraffic Methods

We propose a new methodology for modeling and analyzing heavy-tailed distributions, such as the Pareto distribution, in communication networks. The basis of our approach is a fitting algorithm which approximates a heavy-tailed distribution by a hyperexponential distribution. This algorithm possesses several key properties. First, the approximation can be achieved within any desired degree of accu...

Publication date: 2013